The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
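The practices respondents reported most often — patch-based training for oversized samples and k-fold cross-validation whose fold models double as an ensemble — can be sketched in a few lines. The snippet below is a minimal illustration of these generic practices, not code from any surveyed solution; `train_model` and `predict` are placeholder hooks for an arbitrary learner.

```python
import numpy as np
from sklearn.model_selection import KFold

def extract_patches(volume, patch=64, stride=64):
    """Patch-based handling of a 3D sample too large to process at once."""
    patches = []
    for z in range(0, volume.shape[0] - patch + 1, stride):
        for y in range(0, volume.shape[1] - patch + 1, stride):
            for x in range(0, volume.shape[2] - patch + 1, stride):
                patches.append(volume[z:z + patch, y:y + patch, x:x + patch])
    return np.stack(patches)

def kfold_ensemble_predict(X, y, X_test, train_model, predict, k=5):
    """k-fold cross-validation whose fold models form a homogeneous ensemble:
    test-set predictions are averaged across the k fold models."""
    fold_preds = []
    for train_idx, _ in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = train_model(X[train_idx], y[train_idx])
        fold_preds.append(predict(model, X_test))
    return np.mean(fold_preds, axis=0)
```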
Compared with 2D lanes, real 3D lane data is difficult to collect accurately. In this paper, we propose a novel method for training 3D lane detection with only 2D lane labels, called weakly supervised 3D lane detection (WS-3D-Lane). Under the assumptions of constant lane width and equal height between adjacent lanes, we indirectly supervise the 3D lane height during training. To overcome the problem of dynamic camera pitch changes during data collection, a camera pitch self-calibration method is proposed. For the anchor representation, we propose a double-layer anchor with an improved non-maximum suppression (NMS) method, which enables the anchor-based approach to predict two lane lines that lie close to each other. Experiments are conducted on the basis of 3D-LaneNet under the two supervision settings. In the weakly supervised setting, our WS-3D-Lane outperforms the previous 3D-LaneNet: the F-score rises to 92.3% on the Apollo 3D synthetic dataset, and the F1 score rises to 74.5% on ONCE-3DLanes. Meanwhile, WS-3D-Lane in the purely supervised setting yields even larger gains and outperforms the state of the art. To the best of our knowledge, WS-3D-Lane is the first attempt at 3D lane detection in a weakly supervised setting.
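As a toy illustration of why the constant-width and equal-height assumptions can supervise height: under a pinhole camera, where a lane pixel lands in 3D depends on the assumed road height, so requiring adjacent lanes to keep a constant metric width constrains that height. The geometry below uses my own conventions (x right, y down, z forward) and is not the paper's code.

```python
import numpy as np

def backproject(u, v, fx, fy, cu, cv, pitch, cam_h, road_h=0.0):
    """Intersect the viewing ray of pixel (u, v) with a horizontal plane
    road_h above the ground; the camera sits cam_h above the ground."""
    d = np.array([(u - cu) / fx, (v - cv) / fy, 1.0])     # ray in camera frame
    c, s = np.cos(pitch), np.sin(pitch)
    d = np.array([[1, 0, 0], [0, c, -s], [0, s, c]]) @ d  # undo camera pitch
    t = (cam_h - road_h) / d[1]                           # ray/plane intersection
    return t * d                                          # (lateral, down, forward)

def implied_width(u_left, u_right, v, road_h, cam):
    """Metric width between two lane pixels on one image row: it varies with
    the assumed road height, so a known constant width pins the height down."""
    return (backproject(u_right, v, road_h=road_h, **cam)[0]
            - backproject(u_left, v, road_h=road_h, **cam)[0])
```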
With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training more generalizable AI models has been widely recognized in medical image analysis. However, collecting large-scale, task-specific unannotated data can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, provide a new source of large-scale images. Yet images published in healthcare (e.g., radiology and pathology) consist largely of compound figures with subplots. To extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework that does not require the traditionally needed detection bounding box annotations, using a new loss function and hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-extensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study to evaluate the efficacy of leveraging self-supervised learning with compound figure separation. Our results show that the proposed SimCFS achieves state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The self-supervised learning model pretrained on large-scale mined figures improved the accuracy of downstream image classification tasks with a contrastive learning algorithm. The source code of SimCFS is publicly available at https://github.com/hrlblab/imageseperation.
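The simulation-based training idea can be pictured as follows: tile individual images into a grid so that synthetic compound figures come with free pseudo boxes. This is my own minimal construction; SimCFS's actual augmentation, side loss, and hard-case simulation are richer.

```python
import numpy as np

def simulate_compound_figure(images, rows=2, cols=2, cell=224, pad=8):
    """Paste single images into a grid to synthesize a compound figure; the
    pseudo sub-figure boxes come for free instead of from manual annotation."""
    canvas = np.full((rows * cell, cols * cell, 3), 255, dtype=np.uint8)
    boxes = []
    for i, img in enumerate(images[: rows * cols]):
        r, c = divmod(i, cols)
        h = w = cell - 2 * pad
        # naive nearest-neighbor resize of each sub-figure into its cell
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
        y0, x0 = r * cell + pad, c * cell + pad
        canvas[y0:y0 + h, x0:x0 + w] = img[ys][:, xs]
        boxes.append((x0, y0, x0 + w, y0 + h))   # pseudo ground-truth box
    return canvas, boxes
```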
Event detection (ED), which aims to detect events from texts and categorize them, is vital to understanding actual happenings in real life. However, mainstream event detection models require high-quality expert human annotations of triggers, which are often costly and thus deter the application of ED to new domains. Therefore, in this paper we focus on low-resource ED without triggers and aim to tackle the following formidable challenges: multi-label classification, insufficient clues, and imbalanced event distribution. We propose a novel trigger-free ED method via a derangement reading comprehension (DRC) framework. More specifically, we treat the input text as the context and concatenate it with all event type tokens, which are regarded as answers with the default question omitted. We can thus leverage the self-attention of pre-trained language models to absorb the semantic relations between the input text and the event types. Moreover, we design a simple yet effective event derangement module (EDM) to prevent major events from being over-learned, yielding a more balanced training process. Experimental results show that our proposed trigger-free ED model is highly competitive with mainstream trigger-based models, demonstrating its strong performance on low-resource event detection.
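A hedged sketch of the reading-comprehension-style formulation: pair the context with event-type tokens and let the encoder's self-attention relate them, scoring each type independently for multi-label prediction. For brevity this scores one type per pass, whereas the paper concatenates all event types at once; the label set and scoring head here are illustrative, not the paper's.

```python
import torch
from transformers import AutoModel, AutoTokenizer

EVENT_TYPES = ["Attack", "Transport", "Meet"]       # toy label set, not the paper's
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")
head = torch.nn.Linear(enc.config.hidden_size, 1)   # shared scoring head

def event_probs(text: str) -> torch.Tensor:
    """Multi-label event typing: an independent sigmoid per event type."""
    probs = []
    for etype in EVENT_TYPES:
        # "[CLS] context [SEP] event type [SEP]": self-attention in the encoder
        # relates the context to the type tokens, with no explicit question
        batch = tok(text, etype, return_tensors="pt")
        cls = enc(**batch).last_hidden_state[:, 0]  # [CLS] representation
        probs.append(torch.sigmoid(head(cls)))
    return torch.cat(probs, dim=-1)                 # (1, num_event_types)
```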
Visual representation learning is the key to solving various vision problems. Relying on the seminal grid-structure prior, convolutional neural networks (CNNs) have been the de facto standard architecture for most deep vision models. For instance, classical semantic segmentation methods often adopt a fully convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on increasing the receptive field through either dilated (i.e., atrous) convolutions or inserted attention modules. However, the FCN-based architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating visual representation learning as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches, without local convolution or resolution reduction. With the global context modeled in every layer of the transformer, stronger visual representations can be learned to better tackle vision tasks. In particular, our segmentation model, termed SEgmentation TRansformer (SETR), excels on ADE20K (50.28% mIoU, the first position on the test leaderboard on the day of submission) and Pascal Context (55.83% mIoU), and achieves competitive results on Cityscapes. Furthermore, we formulate a family of Hierarchical Local-Global (HLG) transformers, characterized by local attention within windows and global attention across windows in a hierarchical, pyramidal architecture. Extensive experiments show that our method achieves appealing performance on a variety of visual recognition tasks (e.g., image classification, object detection, instance segmentation, and semantic segmentation).
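The HLG design is only summarized above, so the following is a speculative sketch of the general local-window-plus-global-summary attention pattern it describes; the block structure, naming, and the way the local and global paths are fused here are my assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """Local attention inside non-overlapping windows, plus global attention
    over one summary token per window (a guess at the HLG pattern)."""
    def __init__(self, dim=96, heads=3, window=7):
        super().__init__()
        self.w = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                # x: (B, H, W, C); H and W % window == 0
        B, H, W, C = x.shape
        w = self.w
        # partition into (B * num_windows, w * w, C) token groups
        win = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        win = win.reshape(-1, w * w, C)
        win, _ = self.local_attn(win, win, win)      # attention within windows
        summary = win.mean(dim=1).view(B, -1, C)     # one token per window
        summary, _ = self.global_attn(summary, summary, summary)  # across windows
        win = win + summary.reshape(-1, 1, C)        # broadcast global context back
        # un-partition back to the (B, H, W, C) feature map
        win = win.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return win.reshape(B, H, W, C)
```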
Object detection using single-point supervision has received increasing attention over the years. In this paper, we attribute the large performance gap (relative to bounding-box supervised detectors) to the failure to generate high-quality proposal bags, which are crucial for multiple instance learning (MIL). To address this problem, we introduce a lightweight alternative to off-the-shelf proposal (OTSP) methods and thereby create the point-to-box network (P2BNet), which can construct an inter-object balanced proposal bag by generating proposals in an anchor-like way. By fully exploiting the accurate position information, P2BNet further constructs instance-level bags, avoiding the mixture of multiple objects. Finally, a coarse-to-fine policy in a cascade fashion is used to improve the IoU between the proposals and the ground truth (GT). Benefiting from these strategies, P2BNet is able to produce high-quality instance-level bags for object detection. P2BNet improves the mean average precision (AP) by more than 50% relative to the previous best PSOD method on the MS COCO dataset. It also demonstrates great potential for bridging the performance gap between point-supervised and bounding-box supervised detectors. The code will be released at github.com/ucas-vg/p2bnet.
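The anchor-like bag construction can be pictured as below: every annotated point receives the same fixed set of proposals spanning several scales, ratios, and jitters, so bags stay balanced across objects regardless of object size. MIL then picks the best-fitting box per bag, and the cascade refines it. This is an illustrative construction, not the released P2BNet code.

```python
import itertools
import numpy as np

def point_to_proposal_bag(px, py, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0),
                          jitters=((0, 0), (-8, 0), (8, 0), (0, -8), (0, 8))):
    """Build an anchor-like proposal bag around one annotated point; every
    point gets the same number of proposals, keeping bags inter-object balanced."""
    bag = []
    for s, r, (dx, dy) in itertools.product(scales, ratios, jitters):
        w, h = s * np.sqrt(r), s / np.sqrt(r)    # box shape from scale and ratio
        cx, cy = px + dx, py + dy                # jittered center around the point
        bag.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(bag)                         # (num_proposals, 4), (x1, y1, x2, y2)
```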
Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission.
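As a rough illustration of the pipeline described above, the sketch below patchifies an image with a strided convolution, runs a pure transformer encoder at constant resolution, and decodes with a simple per-patch classifier plus bilinear upsampling. The layer sizes are placeholders; this is a condensed reading of the idea, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySETR(nn.Module):
    def __init__(self, num_classes, img=256, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.grid = img // patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))     # positions
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)  # no resolution reduction
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, C)
        tokens = self.encoder(tokens)        # global context at every layer
        feat = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        # simple decoder: per-patch class logits, upsampled to the input size
        return F.interpolate(self.head(feat), size=x.shape[-2:],
                             mode="bilinear", align_corners=False)
```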
A crucial issue with current text generation models is that they often uncontrollably generate text that is factually inconsistent with their inputs. Limited by the lack of annotated data, existing works on evaluating factual consistency directly transfer the reasoning ability of models trained on other data-rich upstream tasks, such as question answering (QA) and natural language inference (NLI), without any further adaptation. As a result, they perform poorly on real generated text and are heavily biased by their single-source upstream tasks. To alleviate this problem, we propose a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric, namely WeCheck. WeCheck first utilizes a generative model to accurately label a real generated sample by aggregating its weak labels, which are inferred from multiple resources. Then, we train the target metric model with this weak supervision while taking noise into consideration. Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% absolute improvement over previous state-of-the-art methods on the TRUE benchmark on average.
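Schematically, the recipe is: aggregate noisy signals from several upstream checkers into one soft label, then train the target metric with a noise-aware objective. The sketch below uses a plain mean and a confidence filter as stand-ins for WeCheck's learned generative labeling model and its noise handling.

```python
import torch
import torch.nn.functional as F

def aggregate_weak_labels(qa_score, nli_score, sum_score):
    """Fuse noisy consistency signals from upstream checkers into one soft
    label; a plain mean stands in for WeCheck's learned generative labeler."""
    return torch.stack([qa_score, nli_score, sum_score]).mean(dim=0)

def noise_aware_loss(pred_prob, soft_label, confidence_floor=0.7):
    """Train the metric on soft labels, discarding low-confidence samples."""
    confidence = torch.maximum(soft_label, 1.0 - soft_label)
    mask = (confidence >= confidence_floor).float()   # drop ambiguous labels
    bce = F.binary_cross_entropy(pred_prob, soft_label, reduction="none")
    return (mask * bce).sum() / mask.sum().clamp(min=1.0)
```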
Recent advances in operator learning theory have improved our knowledge about learning maps between infinite-dimensional spaces. However, for large-scale engineering problems such as concurrent multiscale simulation of mechanical properties, the training cost of current operator learning methods is very high. This article presents a thorough analysis of the mathematical underpinnings of the operator learning paradigm and proposes a kernel learning method that maps between function spaces. We first provide a survey of modern kernel and operator learning theory, and discuss recent results and open problems. From there, the article presents an algorithm for analytically approximating piecewise constant functions on R for operator learning, which suggests that neural operators can feasibly succeed on clustered functions. Finally, a k-means clustered domain based on a mechanistic response is considered, and the Lippmann-Schwinger equation for micro-mechanical homogenization is solved. The article briefly discusses the mathematics of previous kernel learning methods and some preliminary results with those methods. The proposed kernel operator learning method uses graph kernel networks to arrive at a mechanistic reduced-order method for multiscale homogenization.
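The clustered-function idea can be illustrated by quantizing a sampled field into k constant pieces with k-means — the kind of piecewise constant input on which the article argues neural operators can succeed. This toy surrogate is my own; the paper's graph kernel networks and Lippmann-Schwinger solver go far beyond it.

```python
import numpy as np
from sklearn.cluster import KMeans

def piecewise_constant_surrogate(field, k=8):
    """Quantize a sampled function on a grid into k constant pieces, e.g. a
    microstructure stiffness field reduced to k 'phases'; the operator is then
    learned between such clustered inputs and their homogenized responses."""
    vals = field.reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vals)
    return km.cluster_centers_[km.labels_].reshape(field.shape)
```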
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while constrained by the behavior policy to avoid a significant distributional shift. In this paper, we propose our closed-form policy improvement operators. We make a novel observation that the behavior constraint naturally motivates the use of first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policies as a Gaussian Mixture and overcome the induced optimization difficulties by leveraging the LogSumExp's lower bound and Jensen's Inequality, giving rise to a closed-form policy improvement operator. We instantiate offline RL algorithms with our novel policy improvement operators and empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
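The two approximations named above can be written out as follows; the notation is mine and the paper's exact objective may differ.

```latex
% Linearize the learned critic around a behavior action a_\beta (first-order Taylor):
Q(s, a) \;\approx\; Q(s, a_\beta)
        + \nabla_a Q(s, a)\big|_{a = a_\beta}^{\top} (a - a_\beta),
% which makes the behavior-constrained objective linear in a, hence solvable in
% closed form. For a Gaussian-mixture behavior policy, the intractable log-density
\log \pi_\beta(a \mid s) = \log \sum_i w_i \, \mathcal{N}(a; \mu_i, \Sigma_i)
% is bounded below via Jensen's inequality (concavity of log; weights w_i sum to 1):
\log \sum_i w_i \, \mathcal{N}(a; \mu_i, \Sigma_i)
      \;\ge\; \sum_i w_i \log \mathcal{N}(a; \mu_i, \Sigma_i),
% or via the LogSumExp bound \log \sum_i e^{x_i} \ge \max_i x_i.
```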